{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy, scipy, matplotlib.pyplot as plt, sklearn, librosa, mir_eval, IPython.display, urllib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[&larr; Back to Index](index.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unsupervised Instrument Classification Using K-Means "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This lab is loosely based on [Lab 3](https://ccrma.stanford.edu/workshops/mir2010/Lab3_2010.pdf) (2010)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read Audio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retrieve an audio file, load it into an array, and listen to it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "urllib.urlretrieve?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "librosa.load?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "IPython.display.Audio?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detect Onsets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Detect onsets in the audio signal:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "librosa.onset.onset_detect?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Convert the onsets from units of frames to seconds (and samples):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "librosa.frames_to_time?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "librosa.frames_to_samples?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Listen to detected onsets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "mir_eval.sonify.clicks?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "IPython.display.Audio?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the [librosa API reference](http://bmcfee.github.io/librosa/index.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, define which features to extract:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def extract_features(x, fs):\n",
    "    feature_1 = librosa.zero_crossings(x).sum() # placeholder\n",
    "    feature_2 = 0 # placeholder\n",
    "    return [feature_1, feature_2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each onset, extract a feature vector from the signal:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Assumptions:\n",
    "# x: input audio signal\n",
    "# fs: sampling frequency\n",
    "# onset_samples: onsets in units of samples\n",
    "frame_sz = fs*0.100\n",
    "features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Scale Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `sklearn.preprocessing.MinMaxScaler` to scale your features to be within `[-1, 1]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sklearn.preprocessing.MinMaxScaler?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sklearn.preprocessing.MinMaxScaler.fit_transform?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `scatter` to plot features on a 2-D plane. (Choose two features at a time.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.scatter?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cluster Using K-Means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `KMeans` to cluster your features and compute labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sklearn.cluster.KMeans?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sklearn.cluster.KMeans.fit_predict?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot Features by Class Label"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `scatter`, but this time choose a different marker color (or type) for each class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.scatter?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Listen to Click Track"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a beep for each onset within a class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "IPython.display.Audio?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Listen to Clustered Frames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `concatenated_segments` function from the [feature sonification exercise](feature_sonification.html) to concatenate frames from the same cluster into one signal. Then listen to the signal. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def concatenate_segments(segments, fs=44100, pad_time=0.300):\n",
    "    padded_segments = [numpy.concatenate([segment, numpy.zeros(int(pad_time*fs))]) for segment in segments]\n",
    "    return numpy.concatenate(padded_segments)\n",
    "concatenated_signal = concatenate_segments(segments, fs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare across separate classes. What do you hear?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## For Further Exploration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use a different number of clusters in `KMeans`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use a different initialization method in `KMeans`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use different features. Compare tonal features against timbral features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "librosa.feature?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use different audio files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#filename = '1_bar_funk_groove.mp3'\n",
    "#filename = '58bpm.wav'\n",
    "#filename = '125_bounce.wav'\n",
    "#filename = 'prelude_cmaj_10s.wav'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[&larr; Back to Index](index.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
